BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data


Abstract

In this paper, we present BlinkDB, a massively parallel, sampling-based approximate query engine for running ad-hoc, interactive SQL queries on large volumes of data. The key insight that BlinkDB builds on is that one can often make reasonable decisions in the absence of perfect answers. For example, reliably detecting a malfunctioning server using a distributed collection of system logs does not require analyzing every request processed by the system. Based on this insight, BlinkDB allows one to trade off query accuracy for response time, enabling interactive queries over massive data by running queries on data samples and presenting results annotated with meaningful error bars. To achieve this, BlinkDB uses two key ideas that differentiate it from previous work in this area: (1) an adaptive optimization framework that builds and maintains a set of multi-dimensional, multi-resolution samples from original data over time, and (2) a dynamic sample selection strategy that selects an appropriately sized sample based on a query's accuracy and/or response time requirements. We have built an open-source version of BlinkDB and validated its effectiveness using the well-known TPC-H benchmark as well as a real-world analytic workload derived from Conviva Inc. Our experiments on a 100-node cluster show that BlinkDB can answer a wide range of queries from a real-world query trace on up to 17 TBs of data in less than 2 seconds (over 100× faster than Hive), within an error of 2-10%.
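
The two ideas named in the abstract, answering from a sample with error bars and choosing a sample size that meets an accuracy requirement, can be illustrated with a minimal Python sketch. This is not BlinkDB's code or API: the function names `approximate_mean` and `pick_sample` are hypothetical, the samples are plain uniform random samples rather than the multi-dimensional samples the abstract describes, and the error bar is assumed to be a normal-approximation 95% confidence interval.

```python
import math
import random
import statistics


def approximate_mean(population, sample_size, z=1.96, seed=0):
    """Estimate the mean from a uniform random sample.

    Returns (estimate, half_width), where half_width is the half-width of
    a normal-approximation confidence interval (z=1.96 is roughly 95%).
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    estimate = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, z * std_err


def pick_sample(population, sizes, max_relative_error, seed=0):
    """Try sample sizes from smallest to largest and stop at the first one
    whose estimated relative error meets the bound.

    Falls back to the largest size if none meets it.
    Returns (sample_size, estimate, half_width).
    """
    result = None
    for n in sorted(sizes):
        estimate, half_width = approximate_mean(population, n, seed=seed)
        result = (n, estimate, half_width)
        if estimate != 0 and half_width / abs(estimate) <= max_relative_error:
            break
    return result


if __name__ == "__main__":
    # Synthetic data standing in for a large table column.
    data = list(range(1, 100_001))
    n, estimate, half_width = pick_sample(data, [100, 1_000, 10_000], 0.05)
    print(f"sample size {n}: mean ~ {estimate:.1f} +/- {half_width:.1f}")
```

In this sketch a tighter error bound forces a larger sample and therefore more work, which is the accuracy-for-response-time trade-off the abstract refers to.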


Bibliographic details

Authors: Agarwal, Sameer; Panda, Aurojit; Mozafari, Barzan; Madden, Samuel; Stoica, Ion
Year: 2012
Original format: PDF
Language: English

